Loan Analysis (Prosper Provided) by Andrew Teixeira

Initial Data Loading and Exploration

The data, provided by Prosper and made available by Udacity through the Data Analyst program, is data on loans made through the Prosper platform, an online fintech company that offers peer-to-peer financing in which individuals can loan money to individual borrowers for specific purposes, a different means of financing from the traditional loans.

In the dataset, we can examine and explore the characteristics of a Prosper loan (Interest Rate, APR etc.) and a Prosper borrower (Credit Score etc.) that may make for an interesting insight into how the platform operates, finances itself, and is lucrative for investors.

knitr::opts_chunk$set(echo=FALSE, warning=FALSE, message=FALSE)

Load all relevant libraries

Establish working directory

## [1] "/Users/andrewteixeira/Documents/DAND/Final_Project_EDA"
## [1] "/Users/andrewteixeira/Documents/DAND/Final_Project_EDA"

Load the data into the variable “loans”

Examining the column names

##  [1] "ListingKey"                         
##  [2] "ListingNumber"                      
##  [3] "ListingCreationDate"                
##  [4] "CreditGrade"                        
##  [5] "Term"                               
##  [6] "LoanStatus"                         
##  [7] "ClosedDate"                         
##  [8] "BorrowerAPR"                        
##  [9] "BorrowerRate"                       
## [10] "LenderYield"                        
## [11] "EstimatedEffectiveYield"            
## [12] "EstimatedLoss"                      
## [13] "EstimatedReturn"                    
## [14] "ProsperRating..numeric."            
## [15] "ProsperRating..Alpha."              
## [16] "ProsperScore"                       
## [17] "ListingCategory..numeric."          
## [18] "BorrowerState"                      
## [19] "Occupation"                         
## [20] "EmploymentStatus"                   
## [21] "EmploymentStatusDuration"           
## [22] "IsBorrowerHomeowner"                
## [23] "CurrentlyInGroup"                   
## [24] "GroupKey"                           
## [25] "DateCreditPulled"                   
## [26] "CreditScoreRangeLower"              
## [27] "CreditScoreRangeUpper"              
## [28] "FirstRecordedCreditLine"            
## [29] "CurrentCreditLines"                 
## [30] "OpenCreditLines"                    
## [31] "TotalCreditLinespast7years"         
## [32] "OpenRevolvingAccounts"              
## [33] "OpenRevolvingMonthlyPayment"        
## [34] "InquiriesLast6Months"               
## [35] "TotalInquiries"                     
## [36] "CurrentDelinquencies"               
## [37] "AmountDelinquent"                   
## [38] "DelinquenciesLast7Years"            
## [39] "PublicRecordsLast10Years"           
## [40] "PublicRecordsLast12Months"          
## [41] "RevolvingCreditBalance"             
## [42] "BankcardUtilization"                
## [43] "AvailableBankcardCredit"            
## [44] "TotalTrades"                        
## [45] "TradesNeverDelinquent..percentage." 
## [46] "TradesOpenedLast6Months"            
## [47] "DebtToIncomeRatio"                  
## [48] "IncomeRange"                        
## [49] "IncomeVerifiable"                   
## [50] "StatedMonthlyIncome"                
## [51] "LoanKey"                            
## [52] "TotalProsperLoans"                  
## [53] "TotalProsperPaymentsBilled"         
## [54] "OnTimeProsperPayments"              
## [55] "ProsperPaymentsLessThanOneMonthLate"
## [56] "ProsperPaymentsOneMonthPlusLate"    
## [57] "ProsperPrincipalBorrowed"           
## [58] "ProsperPrincipalOutstanding"        
## [59] "ScorexChangeAtTimeOfListing"        
## [60] "LoanCurrentDaysDelinquent"          
## [61] "LoanFirstDefaultedCycleNumber"      
## [62] "LoanMonthsSinceOrigination"         
## [63] "LoanNumber"                         
## [64] "LoanOriginalAmount"                 
## [65] "LoanOriginationDate"                
## [66] "LoanOriginationQuarter"             
## [67] "MemberKey"                          
## [68] "MonthlyLoanPayment"                 
## [69] "LP_CustomerPayments"                
## [70] "LP_CustomerPrincipalPayments"       
## [71] "LP_InterestandFees"                 
## [72] "LP_ServiceFees"                     
## [73] "LP_CollectionFees"                  
## [74] "LP_GrossPrincipalLoss"              
## [75] "LP_NetPrincipalLoss"                
## [76] "LP_NonPrincipalRecoverypayments"    
## [77] "PercentFunded"                      
## [78] "Recommendations"                    
## [79] "InvestmentFromFriendsCount"         
## [80] "InvestmentFromFriendsAmount"        
## [81] "Investors"
##                ListingKey ListingNumber           ListingCreationDate
## 1 1021339766868145413AB3B        193129 2007-08-26 19:09:29.263000000
## 2 10273602499503308B223C1       1209647 2014-02-27 08:28:07.900000000
## 3 0EE9337825851032864889A         81716 2007-01-05 15:00:47.090000000
## 4 0EF5356002482715299901A        658116 2012-10-22 11:02:35.010000000
## 5 0F023589499656230C5E3E2        909464 2013-09-14 18:38:39.097000000
## 6 0F05359734824199381F61D       1074836 2013-12-14 08:26:37.093000000
##   CreditGrade Term LoanStatus          ClosedDate BorrowerAPR BorrowerRate
## 1           C   36  Completed 2009-08-14 00:00:00     0.16516       0.1580
## 2               36    Current                         0.12016       0.0920
## 3          HR   36  Completed 2009-12-17 00:00:00     0.28269       0.2750
## 4               36    Current                         0.12528       0.0974
## 5               36    Current                         0.24614       0.2085
## 6               60    Current                         0.15425       0.1314
##   LenderYield EstimatedEffectiveYield EstimatedLoss EstimatedReturn
## 1      0.1380                      NA            NA              NA
## 2      0.0820                 0.07960        0.0249         0.05470
## 3      0.2400                      NA            NA              NA
## 4      0.0874                 0.08490        0.0249         0.06000
## 5      0.1985                 0.18316        0.0925         0.09066
## 6      0.1214                 0.11567        0.0449         0.07077
##   ProsperRating..numeric. ProsperRating..Alpha. ProsperScore
## 1                      NA                                 NA
## 2                       6                     A            7
## 3                      NA                                 NA
## 4                       6                     A            9
## 5                       3                     D            4
## 6                       5                     B           10
##   ListingCategory..numeric. BorrowerState    Occupation EmploymentStatus
## 1                         0            CO         Other    Self-employed
## 2                         2            CO  Professional         Employed
## 3                         0            GA         Other    Not available
## 4                        16            GA Skilled Labor         Employed
## 5                         2            MN     Executive         Employed
## 6                         1            NM  Professional         Employed
##   EmploymentStatusDuration IsBorrowerHomeowner CurrentlyInGroup
## 1                        2                True             True
## 2                       44               False            False
## 3                       NA               False             True
## 4                      113                True            False
## 5                       44                True            False
## 6                       82                True            False
##                  GroupKey              DateCreditPulled
## 1                         2007-08-26 18:41:46.780000000
## 2                                   2014-02-27 08:28:14
## 3 783C3371218786870A73D20 2007-01-02 14:09:10.060000000
## 4                                   2012-10-22 11:02:32
## 5                                   2013-09-14 18:38:44
## 6                                   2013-12-14 08:26:40
##   CreditScoreRangeLower CreditScoreRangeUpper FirstRecordedCreditLine
## 1                   640                   659     2001-10-11 00:00:00
## 2                   680                   699     1996-03-18 00:00:00
## 3                   480                   499     2002-07-27 00:00:00
## 4                   800                   819     1983-02-28 00:00:00
## 5                   680                   699     2004-02-20 00:00:00
## 6                   740                   759     1973-03-01 00:00:00
##   CurrentCreditLines OpenCreditLines TotalCreditLinespast7years
## 1                  5               4                         12
## 2                 14              14                         29
## 3                 NA              NA                          3
## 4                  5               5                         29
## 5                 19              19                         49
## 6                 21              17                         49
##   OpenRevolvingAccounts OpenRevolvingMonthlyPayment InquiriesLast6Months
## 1                     1                          24                    3
## 2                    13                         389                    3
## 3                     0                           0                    0
## 4                     7                         115                    0
## 5                     6                         220                    1
## 6                    13                        1410                    0
##   TotalInquiries CurrentDelinquencies AmountDelinquent
## 1              3                    2              472
## 2              5                    0                0
## 3              1                    1               NA
## 4              1                    4            10056
## 5              9                    0                0
## 6              2                    0                0
##   DelinquenciesLast7Years PublicRecordsLast10Years
## 1                       4                        0
## 2                       0                        1
## 3                       0                        0
## 4                      14                        0
## 5                       0                        0
## 6                       0                        0
##   PublicRecordsLast12Months RevolvingCreditBalance BankcardUtilization
## 1                         0                      0                0.00
## 2                         0                   3989                0.21
## 3                        NA                     NA                  NA
## 4                         0                   1444                0.04
## 5                         0                   6193                0.81
## 6                         0                  62999                0.39
##   AvailableBankcardCredit TotalTrades TradesNeverDelinquent..percentage.
## 1                    1500          11                               0.81
## 2                   10266          29                               1.00
## 3                      NA          NA                                 NA
## 4                   30754          26                               0.76
## 5                     695          39                               0.95
## 6                   86509          47                               1.00
##   TradesOpenedLast6Months DebtToIncomeRatio    IncomeRange
## 1                       0              0.17 $25,000-49,999
## 2                       2              0.18 $50,000-74,999
## 3                      NA              0.06  Not displayed
## 4                       0              0.15 $25,000-49,999
## 5                       2              0.26      $100,000+
## 6                       0              0.36      $100,000+
##   IncomeVerifiable StatedMonthlyIncome                 LoanKey
## 1             True            3083.333 E33A3400205839220442E84
## 2             True            6125.000 9E3B37071505919926B1D82
## 3             True            2083.333 6954337960046817851BCB2
## 4             True            2875.000 A0393664465886295619C51
## 5             True            9583.333 A180369302188889200689E
## 6             True            8333.333 C3D63702273952547E79520
##   TotalProsperLoans TotalProsperPaymentsBilled OnTimeProsperPayments
## 1                NA                         NA                    NA
## 2                NA                         NA                    NA
## 3                NA                         NA                    NA
## 4                NA                         NA                    NA
## 5                 1                         11                    11
## 6                NA                         NA                    NA
##   ProsperPaymentsLessThanOneMonthLate ProsperPaymentsOneMonthPlusLate
## 1                                  NA                              NA
## 2                                  NA                              NA
## 3                                  NA                              NA
## 4                                  NA                              NA
## 5                                   0                               0
## 6                                  NA                              NA
##   ProsperPrincipalBorrowed ProsperPrincipalOutstanding
## 1                       NA                          NA
## 2                       NA                          NA
## 3                       NA                          NA
## 4                       NA                          NA
## 5                    11000                      9947.9
## 6                       NA                          NA
##   ScorexChangeAtTimeOfListing LoanCurrentDaysDelinquent
## 1                          NA                         0
## 2                          NA                         0
## 3                          NA                         0
## 4                          NA                         0
## 5                          NA                         0
## 6                          NA                         0
##   LoanFirstDefaultedCycleNumber LoanMonthsSinceOrigination LoanNumber
## 1                            NA                         78      19141
## 2                            NA                          0     134815
## 3                            NA                         86       6466
## 4                            NA                         16      77296
## 5                            NA                          6     102670
## 6                            NA                          3     123257
##   LoanOriginalAmount LoanOriginationDate LoanOriginationQuarter
## 1               9425 2007-09-12 00:00:00                Q3 2007
## 2              10000 2014-03-03 00:00:00                Q1 2014
## 3               3001 2007-01-17 00:00:00                Q1 2007
## 4              10000 2012-11-01 00:00:00                Q4 2012
## 5              15000 2013-09-20 00:00:00                Q3 2013
## 6              15000 2013-12-24 00:00:00                Q4 2013
##                 MemberKey MonthlyLoanPayment LP_CustomerPayments
## 1 1F3E3376408759268057EDA             330.43            11396.14
## 2 1D13370546739025387B2F4             318.93                0.00
## 3 5F7033715035555618FA612             123.32             4186.63
## 4 9ADE356069835475068C6D2             321.45             5143.20
## 5 36CE356043264555721F06C             563.97             2819.85
## 6 874A3701157341738DE458F             342.37              679.34
##   LP_CustomerPrincipalPayments LP_InterestandFees LP_ServiceFees
## 1                      9425.00            1971.14        -133.18
## 2                         0.00               0.00           0.00
## 3                      3001.00            1185.63         -24.20
## 4                      4091.09            1052.11        -108.01
## 5                      1563.22            1256.63         -60.27
## 6                       351.89             327.45         -25.33
##   LP_CollectionFees LP_GrossPrincipalLoss LP_NetPrincipalLoss
## 1                 0                     0                   0
## 2                 0                     0                   0
## 3                 0                     0                   0
## 4                 0                     0                   0
## 5                 0                     0                   0
## 6                 0                     0                   0
##   LP_NonPrincipalRecoverypayments PercentFunded Recommendations
## 1                               0             1               0
## 2                               0             1               0
## 3                               0             1               0
## 4                               0             1               0
## 5                               0             1               0
## 6                               0             1               0
##   InvestmentFromFriendsCount InvestmentFromFriendsAmount Investors
## 1                          0                           0       258
## 2                          0                           0         1
## 3                          0                           0        41
## 4                          0                           0       158
## 5                          0                           0        20
## 6                          0                           0         1

Univariate Plots Section

We will be looking at a few variables that relate to both the loan itself (dollar amount, term) and the borrower (credit score, prosper rating) in order to determine what is likely to influence the total amount Prosper is willing to lend a borrower and at what rate.

Borrower Data Exploration: Subsection: Loan Original Amount

We will be examining the LoanOriginalAmount data from the loans dataset to better understand the frequency of loan values provided by Prosper.

## 
##  4000 15000 10000  5000  2000  3000 25000 20000  1000  2500  7500  7000 
## 14333 12407 11106  6990  6067  5749  3630  3291  3206  2992  2975  2949 
##  6000  3500  8000 12000  9000 13000  1500  4500 
##  2869  2567  2442  1921  1695  1509  1507  1406
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6500    8337   12000   35000

The table above shows that loans of 4,000 are the most provided by Propser. This value is below the median of 6500 and the Mean of 8337, which suggests there are some large loans that are pulling the mean forward.

The max loan provided by Prosper was 35,000.

The histogram above shows a spike at around 3,000- 4,000. This suggests there are a great number of loans provided in this area. Let’s examine further.

## 
##  4000 15000 10000  5000  2000  3000 25000 20000  1000  2500  7500  7000 
## 14333 12407 11106  6990  6067  5749  3630  3291  3206  2992  2975  2949 
##  6000  3500  8000 12000  9000 13000  1500  4500 
##  2869  2567  2442  1921  1695  1509  1507  1406

Borrower Data Exploration: Subsection: Loan Term

Let’s look at the loan term provided by Prosper to get a sense of what types of loans and duration of loans Prosper gives to their borrowers.

## 
##    12    60    36 
##  1614 24545 87778
## 
##    12    36    60 
##  1614 87778 24545

From the table and histogram above, we see that Prosper only gives loans in 12, 36, and 60 month terms. The majority of loans are provided on a 36 month term payment plan.

Borrower Data Exploration: Subsection: Amount Delinquent

We will examine the amount of money that is delinquent per borrower per Prosper loan.

## 
##  80  35  70 200  10  20 130  67 150  15  55  65 120  40  60  75  30  25 
##  28  29  29  29  30  30  31  32  32  34  34  34  35  43  46  56  65  66 
## 100  50 
##  67  73

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max.     NA's 
##      0.0      0.0      0.0    984.5      0.0 463881.0     7622
## 
##     0    50   100    25    30    75    60    40   120    15    55    65 
## 89818    73    67    66    65    56    46    43    35    34    34    34 
##    67   150   130    10    20    35    70   200    80    33    45    54 
##    32    32    31    30    30    29    29    29    28    27    26    26 
##    79   110   300    36    51    57    68    88    90   105   134    32 
##    26    26    26    25    25    25    25    25    25    25    25    24 
##    48    52    84   230    53    78   101   115   158   350    31    37 
##    24    24    24    24    23    23    23    23    23    23    22    22 
##    42    64 
##    22    22

The data above in the histograms and table shows that the majority of Prosper borrowers have 0 dollars in delinquency, a good sign if you are a lender.

Borrower Data Exploration: Subsection: Delinquencies Last 7 Years

Examining the Delinquencies Last 7 Years variable will help us understand the type of borrowers that are included in this dataset. DelinquenciesLast7Years may provide a glimpse into the reliability of borrowers in this set.

## 
##     0     1     3     2     4     5     6     7     8     9    10    11 
## 76439  3967  3183  2879  2592  1826  1790  1648  1421  1208  1151  1075 
##    12    13    14    15    16    17    18    20 
##   982   873   821   795   731   608   574   565
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.000   0.000   4.155   3.000  99.000     990

Findings : Unsurprisingly, we see the vast majority of delinquent loans in the last 7 years for borrowers at 0. The median reflects this by also being 0, however, a mean of 4.155 shows that there are some potentially large amounts of delinquencies in the last 7 years per borrowers that may be driving this value up.

In the table, we see that there are, in fact, 110 borrowers who have 99 delinquencies in the past 7 years. This appears to be a strange lift from 98 (3 total borrowers). Is this a maximum value? Could it be human error?

Findings: A simple histogram shows that, in fact, the majority of delinquencies in last 7 years for borrowers were negatively skewed, unsurprisingly considering that banks and lenders are in the business of ensuring their loans are paid back. Let’s examine and plot that surreptitious the DelinquenciesLast7Years on the larger end of the plot.

Findings The histogram from DelinquenciesLast7Years > 79 show that the #99 is a strange occurance that may be artifically driving the mean up for the total DelinquenciesLast7Years. We will attempt to determine, later in this analysis, what makes an individual become delinquent with some multivariate tests and plots.

Borrower Data Exploration: Subsection: BorrowerAPR

We are getting a better sense of the types of borrowers Prosper loans money to. Now, we will examine the BorrowerAPR to determine how Prosper makes $ of their customers.

The findings above dictate a few peaks: Once between .15 and .20, again at .3, and once more at around .36.

Let’s create a sorted table to examine these trends.

## 
## 0.35797 0.35643 0.37453 0.30532  0.2951 0.35356 0.29776 0.15833 0.24246 
##    3672    1644    1260     902     747     721     707     652     605 
## 0.24758 0.12528 0.17359 0.15324 0.27462 0.27285 0.13799 0.15713 0.12691 
##     601     559     549     547     534     506     489     482     456 
## 0.25781 0.20735 
##     444     433
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
## 0.00653 0.15629 0.20976 0.21883 0.28381 0.51229      25

Interestingly, we see that Prosper’s most commonly issued APR is .3672, this is well above the mean of .21976 and the median of .20976.

Let’s remove all NA’s from APR for analysis - adding the Mean line, we can examine how many values are falling above or below the mean.

## [1] 0.2188277

We see a number of peaks and valleys within the BorrowerAPR data, which suggests there are certain integers that Prosper commonly issues when it comes to BorrowerAPR.

Borrower Data Exploration: Subsection: Income Range

##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916           7741            806
##             $0      $1-24,999      $100,000+ $25,000-49,999 $50,000-74,999 
##            621           7274          17337          32192          31050 
## $75,000-99,999  Not displayed   Not employed 
##          16916              0            806

Findings Overall, the income of borrowers is failing between 25,000 and 74,999.

## 
##    CA    TX    NY    FL    IL          GA    OH    MI    VA    NJ    NC 
## 14717  6842  6729  6720  5921  5515  5008  4197  3593  3278  3097  3084 
##    WA    PA    MD    MO    MN    MA    CO    IN    AZ    WI    OR    TN 
##  3048  2972  2821  2615  2318  2242  2210  2078  1901  1842  1817  1737 
##    AL    CT    SC    NV    KS    KY    OK    LA    UT    AR    MS    NE 
##  1679  1627  1122  1090  1062   983   971   954   877   855   787   674 
##    ID    NH    NM    RI    HI    WV    DC    MT    DE    VT    AK    SD 
##   599   551   472   435   409   391   382   330   300   207   200   189 
##    IA    WY    ME    ND 
##   186   150   101    52

From the table above, we can see that California has the largeset amount of users on Prosper with 14,717 borrowers, while North Dakota has the least with 52 borrowers.

Borrower Subsection: TotalInquiries

## 
##     2     3     1     4     5     0     6     7     8     9    10    11 
## 14887 13934 13785 12148 10098  8430  7607  6171  4692  3779  2914  2431 
##    12    13    14    15    16    17    18    19 
##  1786  1453  1245   978   864   724   581   539

The data above show that most borrowers have very few inquiries on their credit, potentially showing that Prosper selects those who are steadfast and committed to paying down their debts. Let’s create a plot in which the number 0 is removed, showing only borrowers with 0 inquiries

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   4.000   6.036   7.000 379.000

From the plot above, we see a steep decline in total inquiries values after the initial 1,2 or 3 total inquiries.

Given that the 10 value appears to be an anomaly, let’s run a summary on Debt to Income Ratio

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   0.000   0.140   0.220   0.276   0.320  10.010    8554

Let’s further examine Debt to Income Ratios that are above the 3rd Quartile.

## 
##    0.33    0.35    0.34    0.36    0.37    0.38    0.39     0.4    0.41 
##    1902    1812    1810    1523    1399    1364    1226    1109     995 
##    0.42    0.43    0.44    0.45    0.46    0.47    0.48    0.49     0.5 
##     893     851     753     697     634     617     524     470     447 
##    0.51    0.52    0.54    0.53    0.55   10.01    0.56    0.57    0.59 
##     431     372     341     320     279     272     244     229     182 
##    0.58     0.6    0.61    0.62    0.63    0.65    0.64    0.66    0.67 
##     166     149     140     120     108     105      91      78      77 
##    0.68    0.71     0.7    0.69    0.72    0.75    0.73    0.76    0.74 
##      72      67      64      58      58      51      43      43      40 
##    0.78    0.77    0.86    0.87    0.81    0.84    0.85    0.79    0.83 
##      38      35      29      29      27      27      27      26      26 
##     0.8    0.82    0.88    0.89     0.9    0.91    0.94    0.96    0.97 
##      24      22      22      22      20      19      19      19      18 
##    0.92    0.93       1    0.98    0.99    1.02    0.95    1.01    1.05 
##      15      14      14      13      13      12      11      11      11 
##    1.06    1.07    1.03    1.19    1.21    1.04    1.12    1.16    1.08 
##      11      11      10      10      10       8       8       8       7 
##     1.1    1.27    1.14    1.15    1.18    1.22    1.49     1.2    1.25 
##       7       7       6       6       6       6       6       5       5 
##    1.31    1.56    1.09    1.11    1.24    1.32    1.36    1.47    1.54 
##       5       5       4       4       4       4       4       4       4 
##     2.2    1.13    1.29    1.34    1.39    1.41    1.42    1.45    1.46 
##       4       3       3       3       3       3       3       3       3 
##     1.6    1.75    1.79    1.81    1.87    1.89    1.99     2.3    2.38 
##       3       3       3       3       3       3       3       3       3 
##    2.45    2.72    3.49    3.92    1.17    1.23    1.26    1.28     1.3 
##       3       3       3       3       2       2       2       2       2 
##    1.37    1.38     1.5    1.51    1.52    1.55    1.57    1.58    1.61 
##       2       2       2       2       2       2       2       2       2 
##    1.65    1.72    1.74     1.8    1.82    1.93    1.94    1.98    2.02 
##       2       2       2       2       2       2       2       2       2 
##    2.07     2.1    2.19    2.34     2.5     2.6    2.65    2.79    2.95 
##       2       2       2       2       2       2       2       2       2 
##    2.99    3.23    3.27    3.53    3.64    3.73    3.81     4.4    4.43 
##       2       2       2       2       2       2       2       2       2 
##    4.71     4.9    5.36    5.56    7.27 0.32007   0.321 0.32388 0.33348 
##       2       2       2       2       2       1       1       1       1 
## 0.34443 0.34452 0.35155 0.35333  0.3543  0.3555 0.35972 0.36006 0.36232 
##       1       1       1       1       1       1       1       1       1 
## 0.36331 0.37002 0.37772 0.38361  0.3838 0.38587 0.39181 0.39531 0.39846 
##       1       1       1       1       1       1       1       1       1 
## 0.40466 0.40475 0.40572  0.4137 0.41586 0.42965 0.42972 0.43886  0.4413 
##       1       1       1       1       1       1       1       1       1 
## 0.46403   0.499 0.50136 0.50872 0.51504 0.54433 0.56576 0.57091 0.60618 
##       1       1       1       1       1       1       1       1       1 
## 0.78342    1.35     1.4    1.43    1.44    1.53    1.59    1.62    1.64 
##       1       1       1       1       1       1       1       1       1 
##    1.66    1.67    1.68    1.69     1.7    1.73    1.76    1.78    1.85 
##       1       1       1       1       1       1       1       1       1 
##    1.86     1.9    1.92    1.95    1.96    2.05    2.08    2.09    2.11 
##       1       1       1       1       1       1       1       1       1 
##    2.14    2.16    2.18    2.21    2.22    2.23    2.25 2.27329    2.28 
##       1       1       1       1       1       1       1       1       1 
##    2.31    2.35    2.43    2.47    2.48    2.49    2.53    2.55    2.57 
##       1       1       1       1       1       1       1       1       1 
##    2.59    2.63    2.66    2.67     2.7    2.74    2.83    2.86    2.94 
##       1       1       1       1       1       1       1       1       1 
##    2.97    3.07    3.09    3.11    3.14    3.17    3.19    3.22    3.26 
##       1       1       1       1       1       1       1       1       1 
##    3.28    3.29    3.32    3.38    3.39    3.45    3.54    3.58    3.59 
##       1       1       1       1       1       1       1       1       1 
##    3.66    3.68    3.71    3.76    3.77    3.86       4    4.03    4.04 
##       1       1       1       1       1       1       1       1       1 
##    4.13    4.14    4.15    4.16    4.19    4.21    4.27    4.29    4.32 
##       1       1       1       1       1       1       1       1       1 
##    4.33    4.37    4.42    4.44    4.54    4.58    4.66    4.68    4.75 
##       1       1       1       1       1       1       1       1       1 
##    4.76    4.78    4.84    4.85    4.86    4.89    5.02    5.05    5.06 
##       1       1       1       1       1       1       1       1       1 
##     5.1    5.15    5.16    5.18    5.21    5.23    5.26    5.29    5.31 
##       1       1       1       1       1       1       1       1       1 
##    5.34    5.38    5.55  5.5608    5.59    5.64    5.65    5.67    5.69 
##       1       1       1       1       1       1       1       1       1 
##    5.76    5.98    6.01    6.05    6.06    6.07     6.2    6.37    6.49 
##       1       1       1       1       1       1       1       1       1 
##    6.51    6.66    6.68    6.85    6.95    6.96    7.06    7.22    7.42 
##       1       1       1       1       1       1       1       1       1 
##    7.47    7.52    7.59    7.79     7.9    7.99    8.06    8.13    8.27 
##       1       1       1       1       1       1       1       1       1 
##    8.52    8.53    8.63    9.06     9.2    9.44    9.77 
##       1       1       1       1       1       1       1

Findings From the data above, that there are a few loans at various price points that have obscenely high debt to income ratios. Perhaps these folks have a great deal of collateral to put up. Let’s look at all the DTI ratios under or equal to 1.

From the chart above, we see that the majority of Debt-to-Income ratios are falling under .5, no surprise as the 3rd quartile value is .32.

The graph above shows that most borrowers are “employed” or categorized as “full-time”

Let’s examine the credit scores of applicant’s by creating a new variable that takes the mean of credit score upper and credit score lower. We will, when creating a histogram, start the x-axis at 300 to avoid lower values that are outliers or have omitted values.

From the above, we can see that most credit scores fall within 650 and 700. Let’s bucket those credit scores to create a categorical variable for later analysis.

When we look at the bucket bar chart below, we see there are a nearly equal amount of loans provided to average credit scores of between 600-700 and 700-800, with very few happening below or above.

Univariate Analysis

What is the structure of your dataset?

The structure of the dataset includes 113,937 observations of 81 variables, meaning there are 113,937 individual loans oberved. We have created two new variables to better examine credit score. avgCreditScore takes the average of the upper and lower credit score to take the mean for each individual borrower. The new variable CSBucket is a categorical variable that will help determine how credit may influence loans.

What is/are the main feature(s) of interest in your dataset?

The main features of interest for me in the dataset include: Prosper Score: What factors contribute most, or have the highest correlation, to establshing a Prosper rating.

Also, is the LoanOriginalAmount for highly related to Income Range and credit score?

What other features in the dataset do you think will help support your
investigation into your feature(s) of interest?

I am also interested in what levels of credit scores influence the eventual borrower APR. Are there distinct patterns between the credit worthiness of a borrower and whether they get a loan?

Did you create any new variables from existing variables in the dataset?

Yes, I created the avgCreditScore variable to determine what number lies between the lower and upper credit score for each user. Then, I used the cut() function in order to bin the average credit scores per user.

Of the features you investigated, were there any unusual distributions?
Did you perform any operations on the data to tidy, adjust, or change the form

of the data? If so, why did you do this?

I had removed na’s and blank values from a number of individual datapoints in order to better run analysis such as bar graphs.

Bivariate Plots Section

Loan original amount by Prosper Rating

## 
##     1     2     3     4     5     6     7 
##  6935  9795 14274 18345 15581 14551  5372

From the box plot above, we see that loans with Prosper Ratings of A, AA, and BB have much higher quartiles, outliers, and loan original amounts, which makes sense given that Prosper Ratings of C, D, E, HR are of lesser quality according to the Prosper website.

Let’s limit the ylim to further examine the median of the loan categories.

From the zoomed in data, we can see that AA loans have the highest median of any category at slightly above 10,000. Let’s run a table to see what the median # actually is.

## loans$ProsperRating..Alpha.: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    2500    4500    6159    7904   25000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: A
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    5850   10000   11460   15000   35000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: AA
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    6000   10940   11584   16000   35000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: B
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    6000   10000   11622   15000   35000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: C
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    5000   10000   10392   15000   25000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: D
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6100    7083   10000   15000 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: E
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3600    4000    4586    5000   15900 
## -------------------------------------------------------- 
## loans$ProsperRating..Alpha.: HR
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    4000    3463    4000   16800

The table above shows AA ratings with the median of 10,940, the highest of any category.

Loan original amount by Delinquencies Last 7 Years

We’ve examined the loan original amount from the Prosper Rating, let’s see if borrower delinquencies have any effect on the loan original amount.

The plot above shows some obvious levels to the data, as Propser seems to provide loans in 5,000 increments, however, we do see a slight pattern in which, when Delinquencies in the last 7 years rise, the total amount of a loan often falls.

Let’s add transparency and jitter in order to account for overplotting.

The above chart, with it’s dark line on the y-axis at 0, shows that Prosper very much likes to provide loans to those with little to no delinquencies in the past 7 years. There are a few points at 99 in the chart, which may be an anomoly or human error, and is worth further questioning of Prosper’s data capture.

Loan original amount by Employment Status

Building on the picture we are beginning to see with regards to loan original amount, let’s examine Employment status and loan original amount.

##                    Employed     Full-time Not available  Not employed 
##          2255         67322         26355          5347           835 
##         Other     Part-time       Retired Self-employed 
##          3806          1088           795          6134

In this data, I will create a new DF called loans_by_employmentstats as there are 2255 borrowers in which the status is blank, you can see this in the table above.

The boxplot above shows that employed Prosper borrowers, either through a company or self-employed, have higher median loan amounts than other categories, including Full-time and Part-time. This is odd, as employed persons are logically either employed full-time or part-time or self-employed. This is a question for Prosper as to what the classification methodology is.

By placing the overall median line on the graph, we see that only the categories of “Employed” and “self-employed” actually have a higher median amount than the overall median, with the other categories falling behind.

Loan Original Amount by Estimated Return

We’ve got a sense of the borrowers, now let’s take a look at the loan itself by examining the loan original amount and its estimated return.

The graph above, with slight transparency and jitter, show that the estimated return for most loans is falling around .05 and 1.5. Interestingly, there are a fair amount of loans that have negative returns, much more than those that have estimated returns of .2 or more.

A Pearson correlation test below shows a negative correlation of -.28. I would have predicted greater correlation, however, the negative number suggests as the loan original amount increases, the estimated return decreases, a common phenomenmon.

## 
##  Pearson's product-moment correlation
## 
## data:  loans$EstimatedReturn and loans$LoanOriginalAmount
## t = -86.98, df = 84851, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2922833 -0.2799279
## sample estimates:
##        cor 
## -0.2861175

Loan original amount by Investors

## [1] 80.47523

By taking the mean amount of investors per loan, which is calculated to around 80, we can begin to see how many people it actually takes to get a Prosper loan fully-funed. A scatterplot can visualize this phenomeon.

The scatter plot above shows that, as loan amounts grow, so do the total amount of investors needed. The mean of just above 80 shows that the majority of loans are falling at a similar price level, thus a similar amount of investors are needed for analogous loans.

A correlation below shows a Pearson score of .38, lesser than what I expected.

## 
##  Pearson's product-moment correlation
## 
## data:  loans$Investors and loans$LoanOriginalAmount
## t = 138.71, df = 113940, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.3751140 0.3850494
## sample estimates:
##       cor 
## 0.3800926

Loan original amount by Credit Score

We will, finally, use our newly binned data to examine loan original amount by credit score bucket. This will provide a nice picture as to which credit score buckets have the highest median Loan Original amount.

Unsurprsiingly, the buckets of 700-800 and 800-900 both have medians that exceed the overall median. It is interesting to note that 600-700 credit scores have a median Loan Original Amount less than the overall median, an indication credit scores in this area are not deemed as worthy for a loan.

Bivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. How did the feature(s) of interest vary with other features in
the dataset?

Most interestingly, I discovered that a credit score of 600-700 has a substantially lower median loan original amount than credit scores of 700-800. This was surprising to me, as I suspected a slight decrease but, personally, have always considered a credit score of 600-700 to be somewhat trustworthy.

Also, the total amount of investors per loan was quite intriguing, as there were substantial outliers from the mean of 80 per loan. We will further examine this by exploring categories that may influence that in the multi-variate analysis portion.

Did you observe any interesting relationships between the other features
(not the main feature(s) of interest)?

Yes, the relationship between employment status was particularly interesting, as “full time” employees have a signficiantly lower median loan amount than those simply categorized as “employed”. I suspect there may be an error in Prosper’s classification system.

What was the strongest relationship you found?

The relationship between total investors and dollar amount had a Pearson score of .38, the strongest relationship I uncovered.

Interestingly, loan original amount and estimated return were negatively correlated, showing that as a loan amount is greater, estimated returns are generally expected to fail.

Multivariate Plots Section

Examining loan original amound and borrower APR by Credit Score Bucket.

The above scatterplot shows that csBucket 800-900 are largely consolidated at a little less than .1 for their Borrower APR, despite loan amount. This shows that Prosper generally loans funds to those with good credit at various levels of dollar amount.

You can see distinct, almost vertical patterns of increased BorrowerAPR based on their credit score. Lower credit scores from 400-600 are largely consolidated lower on the y-axis at varying APRs, showing that credit scores that are lesser do not get loaned as large amounts as other credit scores.

Examining loan original amount and estimated return by Credit Score Bucket.

Similarly to our analysis on borrower apr, estimated returns are largely attributed to certain credit scores. The highest end of credit scores (800-900), have a distinct Estimated Return of around .05 percent, with 700-800 categories at around .075 to .1, and 600-700 following thereafter with growing estimated returns. This could be because those with higher credit scores pay their loans off quicker, at lower rates.

Interestingly, there are a bevy of loans at the 600-700 credit score level that do not pay off loans that are less than 10,000 in total value. Even though they have credit scores similarly to those borrowing at higher dollar amounts, they were unable to pay their comparable smaller debts.

Loan Original Amount, # of Investors, Credit Score

The graph above shows that the total amount of investors increases as the loan original amount increases, and invariably, the higher credit score users are more adept at getting additional investors. In the scatterplot, you hardly see any 400-700 credit score borrowers beyond the overall median line.

Interestingly, the higher end credit score borrowers of 800-900 generally have a # of investors per loan that is higher than the median and higher than there lesser counterparts, even twhen the loan original amount is the same, this means more individuals are investing smaller amounts per individual loan.

Loan Original Amount, Loan Investors, Prosper Rating

We run the tables below to view which Prosper Rating has the most loans and the highest loan amounts, we find that a score of 7 receives the highest median loan and a score of 1-2 have the lowest medians. This may be an indication of the loan worthiness of the borrowers - thus, we assue 7 to be the best loan for chance of repayment and 1 being the worse.

## 
##     1     2     3     4     5     6     7 
##  6935  9795 14274 18345 15581 14551  5372
## loans$ProsperRating..numeric.: 1
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3000    4000    3463    4000   16800 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 2
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    3600    4000    4586    5000   15900 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 3
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    4000    6100    7083   10000   15000 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 4
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    5000   10000   10392   15000   25000 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 5
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    6000   10000   11622   15000   35000 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 6
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    5850   10000   11460   15000   35000 
## -------------------------------------------------------- 
## loans$ProsperRating..numeric.: 7
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1000    6000   10940   11584   16000   35000

The graph above shows that, as the number of investors and the loan original amount increases, more high-quality loans of Prosper Ratings of 4-5-6-7 are prevalent.

Delinquencies 7 Years Loan Investors, Prosper Rating Numeric

When looking at the loan original amount and delinquencies the last 7 years, it is no surprise that the majority of points on the graph are those that are employed, even when transparency is added for overplotting. With the “employed” category as “red”, you will notice the prevalence of red dots as the loan amount increases, and the lack of red dots as the delinquencies the last 7 years increase below $5,000 loan amounts.

This means that those who are not categorized as “employed” in the Prosper system have higher delinquencies in the past 7 years, and do not get the same level of loan as those who are employed.

Multivariate Analysis

Talk about some of the relationships you observed in this part of the
investigation. Were there features that strengthened each other in terms of
looking at your feature(s) of interest?

In this section, we wanted to look closer at what categories may be determining loan amounts, the amount of investors, and the delinquencies in the last 7 years. We noticed that those who were not categorized as “employed” saw themselves receiving loans of roughly less than 5,000. Those same individuals were likely to have a higher rate of delinquencies in the last 7 years than those categorized as “employed”, and their loan amounts were lesser than their employed counterparts even if the delinquencies in the last 7 years were the same.

We also discovered that Prospers numeric rating runs from 1-7, with 7 being the most trustworthy rating.

Were there any interesting or surprising interactions between features?

I was overall surprised to see that the number of investors increases as credit score and prosper rating increases. It was my inclination that if a loan were to be deemed safe or less risky, more investors would provide higher individual amounts towards that loan, thus, lowering the total amount of investors per loan.

OPTIONAL: Did you create any models with your dataset? Discuss the
strengths and limitations of your model.

We did not create any models with this dataset.


Final Plots and Summary

Plot One

Description One

The plot above shows the Loan Original Amount on the y-axis, the number of delinquencies in the last 7 years on the x-axis, with each plot indicating an employment status as a categorical variable.

This graph shows to me that, should you be employed, you have a marginal chance of receiving a loan from Prosper. Though the large majority of loans are at 0 delinquencies, it is important to notice that there are a great number of loans provided to people who have be delinquent dozens of times. You are more likely to receive a greater loan amount, however, if your employment status is “employed”

Plot Two

Description Two

The graph above shows the number of investors per loan on the x-axis, the total loan original amount on the y-axis, with the categorical variable of credit score bucket. This plot shows that, as the loan amount increases, the number of investors increases as well.

We already know from prior graphs that higher loan amounts are more likely to be people employed with no delinquencies on their account, couple that with higher credit scores and higher loan amounts are generally more safe. We see that, as loan amounts increase, the number of investors per loan also increases.

This, to me, is a strange phenomenmon, as I would have guessed that, if a loan were to be less risky, investors would invest greater amounts towards that loan, thus limiting the total amount of investors per loan.

The plot shows borrowers with credit scores between 700-900 are generally receiving more investors at higher loan amounts than other credit scores.

Plot Three

Description Three

The above plot shows the estimated return per loan on the x-axis, with the loan original amount on the y-axis and the credit score bucket as the categorical variable.

This plot shows that estimated returns will generally fall between .05 and .15, are nearly normally distributed, and will be somewhat dependent on the credit score of the borrower. Borrowers with higher credit scores are shown to provide less returns to investors, likely due to lower interest rates and ability to pay back a loan quickly.

The plot also shows that estimated returns for higher credit score borrowers are largely the same regardless of the loan amount, showing that loan amount does not really affect return rate, and vice versa. ——

Reflection

The Prosper data was quite large and there were a number of variables that were omitted from this analysis simply due to time and expectations from the course. With greater time, one could examine the areas such as investment from friends to see if certain credit score users or certain Prosper loans are more likely to get loans from their friends.

It was interesting to note that there are a higher number of investors per loan, particularly when the credit score of the borrower is high. This goes against my intution that single or a handful of investors would want to buy out a loan completely. This potentially may show some skepticism of the Prosper platform.

I did not conduct any analysis of loan variables over time, as my questions were more focused on what affects a loan amount and the amount of investors investing in said loan. It may be interesting to explore whether estimated returns lowered or expanded overtime for Prosper loans.